# Update on focal-plane image processing research

Sabrina E. Kemeny<sup>1</sup>, El-Sayed Eid<sup>2</sup>, Sunetra Mendis and Eric R. Fossum<sup>1</sup>

Department of Electrical Engineering, Columbia University, New York, NY 10027

### ABSTRACT

An update on research activities at Columbia University in the area of focal-plane image processing is presented. Two thrust areas have been pursued: image reorganization for image compression and image half-toning. The image reorganization processor integrates a 256 x 256 frame-transfer CCD imager with CCD-based circuitry for pixel data reorganization to enable difference encoding for hierarchical image compression. The reorganization circuitry occupies 2% of the total chip area and consists of three parallel-serial-parallel (SP<sup>3</sup>) registers, a pixel resequencing block, and a sampling block for differential output. The chip has achieved a CTE of 0.99994 in this new SP<sup>3</sup> architecture at an output rate of $83 \times 10^3$ pixels/sec (0.9996 at $2 \times 10^6$ pixels/sec) and an overall output amplifier sensitivity of $3.2\mu$V/electron. The half-toning chip design has been described previously, and consists of a 256 x 256 frame-transfer imager, a pipeline register, and a comparator circuit. Functional testing of these elements is reported at this time.

## 1. INTRODUCTION

Charge-coupled device (CCD) technology has found application in image acquisition primarily because of its ability to multiplex charge domain signals with a high degree of fidelity and low power consumption. Integrating additional circuitry on the focal plane to perform signal and/or image processing functions makes sense only when the image acquisition function is not compromised, or when the system requirements are so extreme that parallelism inherent on the focal plane must be exploited [1].

Recently, our group at Columbia University has been investigating focal-plane image processing functions which fall into the former category. In this case, one must choose applications which can be implemented using CCD or CCD-compatible technology [2-4] so that additional, burdensome requirements are not placed on fabrication or operation. Two applications are reported in this work: image reorganization for pyramid image compression and image half-toning. The designs of the CCD circuits for image reorganization and half-toning have been reported previously [5,6]. It is the purpose of this paper to update prior reports with experimental results obtained during the past year. First, the completion of the investigation of image reorganization processors for pyramid compression is discussed. Second, the results of functional testing of elements of the image half-toning integrated circuit are reported.

## 2. CCD IMAGE REORGANIZATION PROCESSORS

Most image processing tasks are performed on small windows (kernels), such as  $3 \times 3$  or  $5 \times 5$  blocks of pixels, yet conventional imagers are read out in sequential raster scan format such that vertical neighbors are separated by a full row of pixels. In this work, five image reorganization ICs (as well as a separate CCD imager) have been designed, fabricated, and tested. These ICs supply downstream image processing electronics with the appropriate data sequences ( $3 \times 3$  pixel neighborhoods) at video rates. Several approaches to such neighborhood reconstruction, providing both row and pixel reorganization, have been implemented on the five ICs. The design and operation of these approaches were reported here last year [5], while this paper summarizes the test results obtained from the fabricated parts.

<sup>1</sup> Present address: JPL, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA 91109.

<sup>2</sup> Present address: CID Technologies, Inc., 101 Commerce Blvd., Liverpool, NY 13088.

Four of the image reorganization chips enable difference encoding for hierarchical lossless compression. In this algorithm, the difference between the center pixel and each of its eight surrounding neighbors in each  $3 \times 3$  pixel block is formed, requiring reconstruction of  $3 \times 3$  neighborhood blocks with the center pixel output first (see Fig. 1). A fifth chip, not discussed previously, provides access to every  $3 \times 3$  window in the image thus enabling a wide variety of image processing tasks such as convolution or centroiding.
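The difference encoding described above can be illustrated with a minimal sketch. It assumes the 3 x 3 block of Fig. 1 is labeled row by row as A B C / D E F / G H I (the figure itself is not reproduced here, so this labeling is an assumption), and the function names are hypothetical. The decoder is included only to show that the center-plus-differences representation is lossless.

```python
# Assumed Fig. 1 labeling (row-major): A B C / D E F / G H I.
# Neighbor order follows the hierarchical code:
# E-F, E-D, E-I, E-H, E-G, E-C, E-B, E-A.
NEIGHBOR_ORDER = [(1, 2), (1, 0), (2, 2), (2, 1), (2, 0), (0, 2), (0, 1), (0, 0)]


def difference_encode(block):
    """Return [E, E-F, E-D, ...] for a 3x3 block given as a list of rows."""
    center = block[1][1]
    return [center] + [center - block[r][c] for r, c in NEIGHBOR_ORDER]


def difference_decode(code):
    """Invert difference_encode and recover the 3x3 block exactly."""
    center = code[0]
    block = [[0] * 3 for _ in range(3)]
    block[1][1] = center
    for (r, c), diff in zip(NEIGHBOR_ORDER, code[1:]):
        block[r][c] = center - diff
    return block


block = [[10, 12, 11],
         [9, 10, 13],
         [8, 10, 14]]
code = difference_encode(block)
restored = difference_decode(code)
```

On the chips this subtraction is performed in the analog domain by the differential output stage; the integer arithmetic here only illustrates the ordering and the reversibility of the code.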

#### 2.1 Implementation

Two basic approaches were taken in the implementation of the image reorganization ICs: integrated and hybrid. In the former, a  $256 \times 256$  buried channel frame transfer image sensor is integrated with the image reorganization circuitry. In the latter, a separate image reorganization chip inputs a conventional raster scan image data stream and outputs a reformatted pixel stream.

Hierarchical Code: E, E-F, E-D, E-I, E-H, E-G, E-C, E-B, E-A

Fig. 1. Schematic illustration of desired output sequence.

The integrated architecture, adopted for two of the difference encoder ICs, consists of five major portions as illustrated in Fig. 2. The 256 x 256 three-phase CCD image sensor is adjacent to a 256 x 256 frame-storage array. Following frame transfer, three lines of the image are loaded into neighborhood reconstruction registers by vertical clocking. The three lines are then shifted horizontally to the pixel resequencing section by applying a channel stop bias to the vertical transfer gates. These registers allow both the horizontal and vertical flow of charge as shown schematically in Fig. 3. The structure is referred to as  $SP^3$ , due to the three serial/parallel transfer structures. Unlike previously reported structures for dual (or triple) serial registers for imager multiplexers [e.g., refs. 8-10], this new structure does not require additional implants or polysilicon levels to achieve the SP<sup>3</sup> function.

The pixel resequencer separates the three lines of image data into 3 x 3 blocks of pixels and outputs a serial stream of nine-pixel blocks with the center pixel first. As described in ref. 7, the two integrated ICs utilized two different techniques for pixel resequencing: wire transfer and pixel delay. A photograph showing a portion of the frame transfer array, the SP<sup>3</sup> structure, and the wire transfer pixel resequencing block is shown in Fig. 4.
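The reordering performed by the pixel resequencer can be sketched in software as follows. This is an illustration of the data flow only, not of the wire transfer or pixel delay circuits; the function name is hypothetical, and the visiting order assumes the Fig. 1 block is labeled row by row as A B C / D E F / G H I.

```python
# Visit order inside each 3x3 block: E, F, D, I, H, G, C, B, A
# (assumed row-major labeling A B C / D E F / G H I).
CENTER_FIRST = [(1, 1), (1, 2), (1, 0), (2, 2), (2, 1), (2, 0),
                (0, 2), (0, 1), (0, 0)]


def resequence(three_rows):
    """Yield the pixels of non-overlapping 3x3 blocks, center pixel first."""
    width = len(three_rows[0])
    usable = width - width % 3  # ignore a trailing partial block
    for col in range(0, usable, 3):
        for r, c in CENTER_FIRST:
            yield three_rows[r][col + c]


rows = [[0, 1, 2, 3, 4, 5],
        [10, 11, 12, 13, 14, 15],
        [20, 21, 22, 23, 24, 25]]
stream = list(resequence(rows))
```

Each group of nine values in `stream` corresponds to one block, so a downstream stage can hold the first value as the reference and difference the following eight against it.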

The final section is the sampling output block, which separates the center pixel from its eight surrounding neighbors and provides sequential, differential output and off-chip drive capability. This block consists of a first-stage source-follower output amplifier (which can be seen in Fig. 4) followed by dual (parallel) sample-and-hold circuits (Fig. 5). The center pixel data are sampled by the upper S/H circuit. The eight peripheral pixels from the neighborhood are sampled by the lower S/H circuit. The S/H circuits are buffered by a matched pair of source-followers with active load transistors, which in turn drive the output pads. Thus the difference in output voltage between the matched circuits is proportional to the difference between the center pixel intensity and each of its surrounding neighbors.

Fig. 4. Microphotograph of IC showing a portion of the frame transfer array, the SP<sup>3</sup> structure, and the wire transfer pixel resequencing block.

Fig. 5. Schematic illustration of circuit used in output sampling blocks.

In the hybrid approach, the imager is read out conventionally and each pixel output voltage is used to generate a charge replica of that pixel. In the case of the two hybrid difference encoder ICs, three rows of imager data are sequentially written into three separate CCD shift registers. These registers are then simultaneously read into the pixel resequencing and sampling output block described previously. In the case of the full access windowing chip, each row is replicated nine times and serially read into nine separate serial-parallel-serial CCD registers. The pixel data are then delayed by the appropriate amount (1 - 3 rows, 1 - 3 pixels) such that a new 3 x 3 window appears at the nine parallel output ports during every pixel readout period, as illustrated in Fig. 6. Nine dual-stage output amplifiers with on-chip sample-and-hold circuits provide direct oscilloscope drive capability (22 pF load). Operational simplicity is achieved through synchronous charge transfer in the imager and reorganization chips. Specifically, the frame transfer parallel electrodes and transfer gate are tied to the parallel reorganization electrodes, while the serial imager output multiplexer controls the serial input and output reorganization registers. In practice, ten of the 13 control and power supply inputs to the reorganization chip are derived from those of the imager which supplies it with optical data.
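The row and pixel delays used by the full access windowing chip can be mimicked in software with a single tapped buffer. The sketch below is a behavioural illustration under that assumption, not a model of the nine CCD registers; `windows_3x3` is a hypothetical name. It relies on the fact that in a raster stream of width `width`, a vertical neighbor is exactly `width` samples away.

```python
from collections import deque


def windows_3x3(raster, width):
    """Yield (center_row, center_col, window) for every complete 3x3 window.

    A buffer holding the most recent 2*width + 3 pixels plays the role of
    the row and pixel delays: each window element is a fixed tap into it.
    """
    taps = deque(maxlen=2 * width + 3)
    for index, pixel in enumerate(raster):
        taps.append(pixel)
        row, col = divmod(index, width)
        if row >= 2 and col >= 2:
            # taps[-1] is the newest pixel (bottom-right window corner);
            # a delay of dr rows and dc pixels is dr*width + dc samples.
            window = [[taps[-1 - (dr * width + dc)] for dc in (2, 1, 0)]
                      for dr in (2, 1, 0)]
            yield row - 1, col - 1, window


# A 3-row, 4-column test image whose pixel values equal their raster index.
result = list(windows_3x3(range(12), width=4))
```

Once the first two rows and two pixels have been received, one new window is produced per input pixel, which mirrors the one-window-per-readout-period behaviour described above.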

All the ICs are implemented using a triple-poly double-metal buried n-channel CCD process. Pixel size is  $15\mu m \times 15\mu m$ . The image and frame store sections occupy 3.9 mm x 7.74 mm, with the remaining circuitry occupying an additional 2% of chip area, or  $0.61 \text{ mm}^2$.

Fig. 6. Schematic diagram of full-access 3 x 3 windowing chip.

#### 2.2 Experimental Results

The chips were tested both electrically and optically. The imaging and processing circuitry was operated with 5 volt three-phase clocks, yielding a total estimated dissipated power ranging from approximately  $150\mu$W for the difference encoding reorganization circuitry to  $410\mu$W for the 3 x 3 windowing chip, at a 30 Hz frame rate, not including the off-chip drive amplifiers. The amplifiers used in the sampling output block add an estimated 7 mW of power since they were designed to drive an oscilloscope directly (1 M$\Omega$, 22 pF load), but in principle need only drive an A/D converter.

The circuits were tested electrically at the wafer-probe and chip level at a 277 kpixel and at a 2 Mpixel/sec output rate, respectively. The two integrated chips contained an additional serial-to-parallel charge electrical input structure above the imaging section to facilitate quantitative testing. Charge transfer efficiency in the vertical registers as well as in the conventional horizontal (serial-to-parallel) registers was measured to exceed 0.99996/stage, and CTE in the horizontal SP<sup>3</sup> registers was measured to be 0.99994/stage at 83 kpixel/sec and 0.9996/stage at 2 Mpixel/sec. Overall output amplifier sensitivity was measured to be  $3.2\mu$V/e<sup>-</sup> for the output sampling block amplifier and  $2.5\mu$V/e<sup>-</sup> for the 3 x 3 windowing chip output amplifiers. Intrinsic read noise levels could not be assessed due to test station noise limitations. Matching of the output amplifier pair (sampling block) was measured to be better than 0.05%, with some chip to chip variation observed. (Mismatch can be corrected using an off-chip preamplifier prior to A/D conversion, if needed.)

Optical testing was performed at a 2 Mpixel/sec output rate (28 frames/sec). A 28-85 mm Nikon lens was used to focus an image onto the chips. Raw output from the chips was first buffered by a pre-amplifier, which through gain and offset, provided a 0-1.5 volt signal which was then either inverted and sent to a raster scan converter for display or sent to the 3 x 3 windowing chip whose output was similarly buffered and displayed. Although functional, performance for the windowing chip was limited by poor CTE in the serial registers. The problem however, is not intrinsic to the design, since "identical" serial/parallel registers were utilized with complete success in all the other ICs. To demonstrate functionality of the integrated ICs, a photograph taken from the screen of the scan converter during imaging with the wire transfer pixel resequencing IC is shown in Fig. 7. The larger image is a portion of the complete  $256 \times 256$  image captured (at a 26 Hz frame rate) by multiplexing the imager output through the upper SP<sup>3</sup> register and bypassing the pixel resequencing circuitry. The inset image is composed of one of the eight difference-encoded elements (center pixel minus neighboring diagonal pixel) of each 3 x 3 block yielding an 80 x 80 subsampled "edge" image also generated at a 26 Hz frame rate.

Fig. 7. Photograph taken from scan-converter monitor showing IC functionality. Image sensor output when reformatting circuitry is bypassed is larger image. Inset is real-time "edge" image using on-chip reformatting circuitry described in text.
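To put the per-stage CTE figures for the SP<sup>3</sup> registers in perspective, the short calculation below compounds them over a register. The stage count of 256 (one stage per pixel along a 256-pixel line) is an assumption made for illustration, since the exact number of transfers per pixel is not stated here.

```python
def retained_fraction(cte_per_stage, n_stages):
    """Fraction of a charge packet remaining after n_stages transfers."""
    return cte_per_stage ** n_stages


N_STAGES = 256  # assumption: one stage per pixel along a 256-pixel register

slow = retained_fraction(0.99994, N_STAGES)  # CTE at 83 kpixel/sec
fast = retained_fraction(0.9996, N_STAGES)   # CTE at 2 Mpixel/sec

print(f"83 kpixel/sec: {slow:.4f} of the charge retained")
print(f"2 Mpixel/sec:  {fast:.4f} of the charge retained")
```

Under this assumption, the worst-case pixel retains roughly 98.5% of its charge at the lower rate and roughly 90% at the higher rate, which indicates how strongly the per-stage value matters once it is raised to the number of transfers.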

#### 2.3 Summary

In summary, functionality of five image reorganization ICs has been demonstrated. Four of these provide real-time neighborhood reconstruction to enable pyramidal, differential output of the image data, thus simplifying downstream electronics and reducing system size, power and weight of lossless hierarchical compression hardware. Two of the ICs integrate a CCD image sensor with the CCD reformatting circuitry. The additional circuitry occupies an additional 2% of chip area and inconsequentially increases IC power dissipation. Signal integrity is not compromised by the structure since charge transfer efficiency is high and the number of transfers is not increased. Thus a first step towards focal-plane data reduction has been realized. The next logical step is the integration of image encoding hardware along with a reorganization preprocessor on the focal-plane, followed by integration of compression hardware in a final generation.

In the future, expansion of the  $3 \times 3$  neighborhood reconstruction window to a larger, perhaps size-adaptive window, would also be a natural extension of this work. Integration of analog processing circuitry with digital control elements would also enhance performance of future focal-plane processing systems.

## 3. IMAGE HALF-TONER

Image half-toning is the process of converting a continuous-tone image into an image consisting of only black or white pixels, i.e. a binary image. While simply thresholding the image using a global threshold level can achieve such a transformation, the quality of the resultant image is typically poor. For better image quality, one seeks an image in which the average pixel intensity in a local neighborhood in the original analog image is equal to the average pixel intensity in the resultant binary image. This leads to a competition between good grey scale reproduction (in the average sense) and high spatial resolution. Algorithms for producing half-tone images have been studied extensively [for example, see Jarvis, et al., ref. 11]. In this work, we have chosen a combination of the error-diffusion algorithm of Floyd and Steinberg [12] and the deterministic interpolation strategy described by Schroeder [13]. The aim is to produce good quality output images utilizing a moderately complex process.

Half-toning is typically driven by image display. For facsimile-type machines, image reproduction on paper is greatly simplified if the image is half-tone, rather than continuous-tone. A low cost video display might also yield better quality images if the image is half-tone. A secondary driver for half-toning is data transmission. If the image is to be ultimately half-tone, there is no need to transmit the continuous-tone image. Considerable data compression can be achieved if the image is transmitted at 1 bit per pixel rather than 8 or 10 bits per pixel. For teleconferencing or in advanced networking stations providing both video and data, such half-tone real-time images may be of sufficient quality to be useful, provided the half-toning can be done in real-time. In this work, we were motivated by such an application and sought a low-cost approach to providing a camera system with a built-in high quality half-toning capability.

The design of the integrated circuit was reported last year at this conference [6], and is described briefly here. The algorithm transforms each analog pixel level to one of the two binary levels (0 or 1) by comparing two quantities. The quantities are formed using a local neighborhood of pixels, as shown in Fig. 8. Since the neighborhood of pixels used in the algorithm precedes the pixel of interest during the readout of the image, they have already been transformed to binary values. One quantity is a weighted sum of the original analog values of the pixels, and the second is a weighted sum (using the same weights) of the binary values of the pixels. Also used in the comparison is the analog value of the pixel of interest and a threshold level. Thus, the local average intensity between the original analog image and the transformed binary image is being continuously compared, and the next pixel to be transformed is chosen to cause convergence of the two quantities. In this way, the performance requirements on the comparator are eased since the negative feedback inherent in the algorithm corrects for comparator errors.

Fig. 8. Illustration of local neighborhood used in half-toning IC and applied weighting function.

Fig. 9. Schematic illustration of architecture used to achieve neighborhood reconstruction and to perform half-toning function.
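The comparison rule described above can be sketched in software. The sketch below is one plausible reading of the text, not the circuit's exact algorithm: the neighborhood shape, the assignment of weights to positions, the use of 16 as the weight of the pixel of interest, and the form of the threshold term are all assumptions, because Fig. 8 is not reproduced here. Only the weight values themselves (16, 7, 5, 3, 1) are taken from the design weights in Table 1.

```python
# Hypothetical causal neighborhood: (row offset, column offset, weight).
# Every neighbor precedes the pixel of interest in raster order.
NEIGHBORS = [(0, -1, 7), (-1, 0, 5), (-1, -1, 3), (-1, 1, 3), (0, -2, 1)]
CENTER_WEIGHT = 16  # assumed weight of the pixel of interest


def halftone(image, threshold=0.5):
    """Binarize an image (list of rows, values in [0, 1]) in raster order."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            analog_sum = CENTER_WEIGHT * image[r][c]
            binary_sum = 0.0
            for dr, dc, weight in NEIGHBORS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    analog_sum += weight * image[rr][cc]
                    binary_sum += weight * out[rr][cc]
            # Emit a 1 when the analog local sum leads the binary one by
            # more than half of the weight the new binary pixel would add.
            lead = analog_sum - binary_sum
            out[r][c] = 1 if lead > CENTER_WEIGHT * threshold else 0
    return out


mid_grey_row = halftone([[0.5, 0.5, 0.5, 0.5]])
white = halftone([[1.0] * 3 for _ in range(3)])
black = halftone([[0.0] * 3 for _ in range(3)])
```

Because each decision feeds back into the binary sum used for later pixels, an erroneous decision is partially compensated by the decisions that follow, which is the negative feedback property noted above.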

#### **3.1 Implementation**

The major observation made from studying the above algorithm is that it requires the reconstruction of both the local analog and binary pixel neighborhoods. A CCD delay line is ideally suited for this and the resultant architecture is that of a pipe organ. The circuit block diagram is shown in Fig. 9. The image acquisition function is performed by a 256 x 256 frame transfer imager. Seven pipe organ registers are required for neighborhood reconstruction. Since a one-pixel delay is introduced in the generation of the binary pixel value, the pipe organ lengths for the binary neighborhood are one pixel shorter than the corresponding registers for the analog neighborhood. Charge is inserted into the registers using a fill-and-spill input stage. The widths of the electrodes used in the input stages determine the weighting values. The input to the binary neighborhood portion of the pipe organ is provided by the output of the comparator, and the voltages used in the input stage are adjusted to ensure scale matching between the analog and binary neighborhoods. The threshold level is also generated by a fill-and-spill structure in the charge domain. The outputs of the pipe organs are summed and converted to voltage using a common floating-diffusion output and source-follower stage. The comparator is a cross-coupled flip-flop, with opposite nodes initially set using the sampled output of the source-followers. The flip-flop is then turned on and flips to one of its two states, depending on the initial node voltages.
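Behaviourally, a set of delay lines of different lengths whose outputs are summed acts as a weighted tapped delay line. The sketch below models this in software; the delay lengths in `TAPS` are hypothetical and chosen only for illustration, and the function name is an assumption. Driving the model with a single unit sample shows why an impulse response measurement reads out the weights directly.

```python
def pipe_organ_output(samples, taps):
    """Sum the weighted, delayed copies of the input at each clock period.

    taps is a list of (delay_in_pixels, weight) pairs, one per register.
    """
    output = []
    for n in range(len(samples)):
        total = 0
        for delay, weight in taps:
            if n - delay >= 0:
                total += weight * samples[n - delay]
        output.append(total)
    return output


# Hypothetical register lengths paired with illustrative weights.
TAPS = [(1, 7), (2, 1), (4, 3), (5, 5), (6, 3)]

impulse = [1, 0, 0, 0, 0, 0, 0, 0]
response = pipe_organ_output(impulse, TAPS)
```

Each nonzero entry of `response` appears at the delay of one register and equals that register's weight, so a mismatch between designed and fabricated electrode widths would show up as a changed amplitude at the corresponding delay.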

The design of the half-toner chip allows for independent assessment of the performance of the imager, pipe organ, and comparator, and if necessary, external substitution of either the imager or comparator functions. Separate circuits containing only the imager, only the pipe organ, and only the comparator were laid out. A complete half-toner chip was also designed and fabricated. The circuits were implemented using a triple-poly three-phase CCD process. The imager and pipe organ are buried-channel structures. The input fill-and-spill stages are surface-channel structures for improved linear performance. The comparator was chosen to be a surface-channel MOSFET design because of the need for a positive threshold voltage device. A circuit diagram showing the comparator and associated circuitry is shown in Fig. 10.

#### **3.2 Experimental Results**

The components of the half-toner IC were tested independently. At the time of this paper, testing of the fully integrated IC has not been performed. The imager (a slightly different device than that described in the first half of this paper) has a pixel size of  $12\mu m \times 12\mu m$ . The imager was found to be functional but detailed measurements have not been performed. The impulse response of the pipe organ was similarly tested at 83 kpixels/s. The relative weights are shown in Table 1.

The voltage comparator was characterized using external voltage-domain input signals at 0.8 MHz. The comparator was found to have a hysteresis of approximately 50 mV, contributing to a signal-history-dependent offset. The origin of the hysteresis is suspected to be charge trapping in the nitride used in these surface-channel devices fabricated in a buried-channel process. The pipe organ output was then fed to the comparator to verify functionality. The pipe organ input was supplied externally from a slowly varying signal generator and the threshold signal was kept constant. Successful operation of the pipe organ/comparator integrated circuit was demonstrated, though limited by the hysteresis described above.
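The effect of comparator hysteresis on the decision can be illustrated with a simple behavioural model. This is an abstraction rather than a model of the cross-coupled flip-flop or of charge trapping: the class name is hypothetical, and splitting the 50 mV hysteresis symmetrically about zero differential input is an assumption.

```python
class HystereticComparator:
    """Comparator whose output depends on input history inside a dead band."""

    def __init__(self, hysteresis=0.050, state=0):
        self.half_band = hysteresis / 2  # assumed symmetric about zero
        self.state = state

    def compare(self, v_plus, v_minus):
        diff = v_plus - v_minus
        if diff > self.half_band:
            self.state = 1
        elif diff < -self.half_band:
            self.state = 0
        # Inside the band the previous decision is kept.
        return self.state


comparator = HystereticComparator()
decisions = [comparator.compare(v, 0.0) for v in (0.010, 0.030, 0.010, -0.030)]
```

The same 10 mV input produces a 0 on the way up and a 1 on the way down, which is the history-dependent offset referred to above.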

#### **3.3 Conclusions**

Based on the demonstrated functionality of its components, it is anticipated that the completely integrated circuit will function as designed when it is tested. Performance of the IC will be limited by hysteresis in both the surface-channel input circuits and the comparator, with the common origin of charge trapping in the nitride. There are no plans at this time for refabricating the ICs and avoiding the use of the nitride process. Nevertheless, we feel we have demonstrated the feasibility of integrating the half-toning circuitry on the focal plane.

## 4. ACKNOWLEDGMENTS

The authors gratefully acknowledge the support of Loral (Ford Aerospace) in the fabrication of the circuits. In particular, we would like to thank Dr. R. Bredthauer, M. LaShell, and J. Pinter. This work was supported, in part, by the NSF Center for Telecommunications Research at Columbia University, and an NSF Presidential Young Investigator Award.

## 5. REFERENCES

- 1. E.R. Fossum, "Architectures for Focal-plane Image Processing," *Opt. Eng.* 28(8), pp. 865-871 (1989).
- 2. E.R. Fossum, "Charge-Coupled Computing for Focal Plane Image Preprocessing," *Opt. Eng.* 26(9), pp. 916-922 (1987).
- 3. E.R. Fossum, "Charge-Domain Analog Signal Processing for Detector Arrays," *Nuclear Instr. and Methods* A275, pp. 530-535 (1989).
- 4. T.L. Vogelsong and J.J. Tieman, "Charge-Domain Integrated Circuits for Signal Processing," *IEEE J. Solid-State Circ.* SC-20(2), pp. 562-570 (1985).
- 5. S.E. Kemeny, H. Meadows and E.R. Fossum, "Design of a CCD Focal-Plane Codec Preprocessor for Lossless Image Compression," in *Charge-Coupled Devices and Solid-State Sensors*, M. Blouke, ed., *Proc. SPIE* 1242, pp. 118-125 (1990).
- 6. E-S. Eid and E.R. Fossum, "Design of a CCD Focal-Plane Image Half-Toner," in *Charge-Coupled Devices and Solid-State Sensors*, M. Blouke, ed., *Proc. SPIE* 1242, pp. 126-132 (1990).
- 7. S.E. Kemeny, *CCD Focal-Plane Image Reorganization Processors*, Ph.D. Thesis, Columbia University, NY (1991).
- 8. R. Angle, J. Carnes, W. Kosonocky, and D. Sauer, "Techniques for the Design of High-Density High-Speed TDI-CCD Image Sensors," in *Proc. 1978 International Conf. on the Appl. of CCDs*, pp. 1-11, San Diego, CA (1978).
- 9. B. Dierickx and J.P. Vermeiren, "A New Very High Resolution Linear Image Sensor Architecture," in *Proc. 1986 Electronic Imaging Conf.*, pp. 114-119 (1986).
- 10. T. Lee, W-C. Chang, W. Miller, G. Torok, K. Wong, B. Burkey, and R. Khosla, "A Four Million Pixel CCD Image Sensor," in *Charge-Coupled Devices and Solid-State Sensors*, M. Blouke, ed., *Proc. SPIE* 1242, pp. 10-16 (1990).
- 11. J. Jarvis, C. Judice, and W. Ninke, "A Survey of Techniques for Display of Continuous-tone Pictures on Bi-level Displays," *Computer Graphics and Image Processing* 5, pp. 13-40 (1976).
- 12. R. Floyd and L. Steinberg, "An Adaptive Algorithm for Spatial Greyscale," *Proc. SID* 17(2), pp. 75-77 (1976).
- 13. M. Schroeder, "Images from Computers," *IEEE Spectrum* 6(3), pp. 66-78 (1969).



Fig. 10. Comparator circuit diagram. Transistor geometry is also specified.

Table 1. Pipe organ weights (arbitrary units). Accuracy is $\pm 0.25$ units.

| Design Weight | Measured Weight |
|---------------|-----------------|
| 16            | 18              |
| 7             | 9               |
| 5             | 5               |
| 3             | 3               |
| 1             | 1               |
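The comparison in Table 1 can be checked with a few lines of arithmetic. The sketch below computes the deviation of each measured weight from its design value and flags deviations larger than the stated measurement accuracy of 0.25 units; the variable names are hypothetical.

```python
design = [16, 7, 5, 3, 1]
measured = [18, 9, 5, 3, 1]
ACCURACY = 0.25  # stated measurement accuracy, in the same arbitrary units

deviations = [m - d for d, m in zip(design, measured)]
outside_accuracy = [abs(dev) > ACCURACY for dev in deviations]

for d, m, dev, flag in zip(design, measured, deviations, outside_accuracy):
    note = "exceeds measurement accuracy" if flag else "within accuracy"
    print(f"design {d:2d}  measured {m:2d}  deviation {dev:+d}  ({note})")
```

Only the two largest weights deviate by more than the measurement accuracy, each by +2 units, while the three smaller weights match their design values.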